Abstract:
To speed up query processing in data warehouse systems, auxiliary summaries, such as materialized views and calculated attributes, are built on top of the data warehouse relations. As changes are made to the data warehouse through maintenance transactions, summary data become stale unless they are refreshed, and refreshing them is expensive. The challenge gets even worse in near real-time environments, particularly with respect to emerging Big Data features. In this paper, inspired by the well-known Lambda architecture, we introduce a novel approach for effectively and efficiently supporting data warehouse maintenance processes in near real-time OLAP scenarios, making use of so-called big summary data. We assess it via an empirical study that stresses the complexity of such OLAP scenarios using the popular TPC-H benchmark.
Abstract:
The recognition that data is of great economic value, together with significant hardware advances in low-cost data storage, high-speed networks, and high-performance parallel computing, fosters new research directions on large-scale knowledge discovery from big sequence databases. There are many applications involving sequence databases, such as customer shopping sequences, web clickstreams, and biological sequences. All these applications are affected by the big data problem. There is no doubt that fast mining of billions of sequences is a challenge. However, because big data sets are not available, it is not possible to assess knowledge discovery algorithms over big sequence databases. For both privacy and security reasons, companies do not disclose their data. On the other hand, existing synthetic sequence generators are not up to the big data challenge. In this paper, we first propose a formal and scalable approach for Parallel Generation of Big Synthetic Sequence Databases. Based on Whitney numbers, the underlying Parallel Sequence Generator (i) creates billions of distinct sequences in parallel and (ii) ensures that injected sequential patterns satisfy user-specified sequence characteristics. Second, we report a scalability and scale-out performance study of the Parallel Sequence Generator, for various sequence database sizes and various numbers of Sequence Generators in a shared-nothing cluster of nodes.
Abstract:
In this paper, we provide three authoritative application scenarios of TPC-H*d, a suitable transformation of the TPC-H benchmark. The three application scenarios are (i) OLAP cube calculus on top of a columnar relational DBMS, (ii) parallel OLAP data cube processing, and (iii) virtual OLAP data cube design. We assess the effectiveness and efficiency of our proposal using open source systems, namely the Mondrian ROLAP server and its OLAP4j driver, MySQL (a row-oriented relational database management system), and MonetDB (a column-oriented relational database management system).
Abstract:
In this paper, we investigate solutions relying on data partitioning schemes for the parallel building of OLAP data cubes, suited to novel Big Data environments, and we propose the framework OLAP*, along with the associated benchmark TPC-H*d, a suitable transformation of the well-known data warehouse benchmark TPC-H. Through performance measurements, we demonstrate the efficiency of the proposed framework, developed on top of the ROLAP server Mondrian.
Abstract:
Smart farming and IoT technologies open up a new research agenda that spans several interrelated areas within a Farm Management Information System, such as robot programming, task scheduling, and sensor data capture, management, and processing at different layers of the IoT ecosystem. Many research works address these topics, but to the best of our knowledge none has contributed a fully featured architecture design for monitoring and scheduling autonomous agricultural robots. In this paper, we propose the skeleton of an architecture for this kind of IoT system, called LambdAgrIoT. It is designed to support big data and different types of workload (real-time, near real-time, analytic, and transactional). We present the main features of each layer, as well as the implementation details and deployment of the Data Source and Speed layers in a real environment. The paper also discusses the open issues related to the other layers and the deployment of the overall architecture at large scale.